Symbolic Compositional Verification by Learning Assumptions
Authors: Rajeev Alur, P. Madhusudan, and Wonhong Nam
Abstract
The verification problem for a system consisting of components can be decomposed into simpler subproblems for the components using assume-guarantee reasoning. However, such compositional reasoning requires user guidance to identify appropriate assumptions for components. In this paper, we propose an automated solution for discovering assumptions based on the $L^*$ algorithm for active learning of regular languages. We present a symbolic implementation of the learning algorithm, and incorporate it in the model checker NuSMV. Our experiments demonstrate significant savings in the computational requirements of symbolic model checking.

Comments: From the 17th International Conference, CAV 2005, Edinburgh, Scotland, UK, July 6-10, 2005. This conference paper is available at ScholarlyCommons: http://repository.upenn.edu/cis_papers/186

Rajeev Alur (1), P. Madhusudan (2), and Wonhong Nam (1)
(1) University of Pennsylvania
(2) University of Illinois at Urbana-Champaign
alur@cis.upenn.edu, madhu@cs.uiuc.edu, wnam@cis.upenn.edu

1 Introduction

In spite of impressive progress in heuristics for searching the reachable state-space of system models, scalability still remains a challenge.
Compositional verification techniques address this challenge by a "divide and conquer" strategy aimed at exploiting the modular structure naturally present in system designs. One such prominent technique is the assume-guarantee rule: to verify that a state property $\varphi$ is an invariant of a system $M$ composed of two modules $M_1$ and $M_2$, it suffices to find an abstract module $A$ such that (1) the composition of $M_1$ and $A$ satisfies the invariant $\varphi$, and (2) the module $M_2$ is a refinement of $A$. Here, $A$ can be viewed as an assumption on the environment of $M_1$ for it to satisfy the property $\varphi$. If we can find such an assumption $A$ that is significantly smaller than $M_2$, then we can verify the requirements (1) and (2) using automated search techniques without having to explore $M$.

* This research was partially supported by ARO grant DAAD19-01-1-0473, and NSF grants ITR/SY 0121431 and CCR0306382.

In this paper, we propose an approach to find the desired assumption $A$ automatically in the context of symbolic state-space exploration. If $M_1$ communicates with $M_2$ via a set $X$ of common boolean variables, then the assumption $A$ can be viewed as a language over the alphabet $2^X$. We compute this assumption using the $L^*$ algorithm for learning a regular language using membership and equivalence queries [6, 21]. The learning-based approach produces a minimal DFA, and the number of queries is only polynomial in the size of the output automaton. The membership query is to test whether a given sequence $\sigma$ over the communication variables belongs to the desired assumption. We implement this as a symbolic invariant verification query that checks whether the module $M_1$ composed with the sequence $\sigma$ satisfies $\varphi$ [16]. For an equivalence query, given a current conjecture assumption $A$, we first test whether $M_1$ composed with $A$ satisfies $\varphi$ using symbolic state-space exploration. If not, the counter-example provided by the model checker is used by the learning algorithm to revise $A$.
Otherwise, we test if $M_2$ refines $A$, which is feasible since $A$ is represented as a DFA. If the refinement test succeeds, we can conclude that $M$ satisfies the invariant; otherwise the model checker gives a sequence $\sigma$ allowed by $M_2$, but ruled out by $A$. We then check if the module $M_1$ stays safe when executed according to $\sigma$: if so, $\sigma$ is used as a counter-example by the learning algorithm to adjust $A$, and otherwise, $\sigma$ is a witness to the fact that the original model $M$ does not satisfy $\varphi$. While the standard $L^*$ algorithm is designed to learn a particular language, and the desired assumption $A$ belongs to a class of languages containing all languages that satisfy the two requirements of the assume-guarantee rule, we show that the above strategy works correctly.

The learning-based approach to automatic generation of assumptions is appealing as it builds the assumption incrementally guided by the model-checking queries, and if it encounters an assumption that has a small representation as a minimal DFA, the algorithm will stop and use it to prove the property. In our context, the size of the alphabet itself grows exponentially with the number of communication variables. Consequently, we propose a symbolic implementation of the $L^*$ algorithm where the required data structures for representing membership information and the assumption automaton are maintained compactly using ordered BDDs [9] for processing the communication variables.

For evaluating the proposed approach, we modified the state-of-the-art symbolic model checker NuSMV [10]. In Section 5, we report on a few examples where the original models contain around 100 variables, and the computational requirements of NuSMV are significant. The only manual step in the current prototype involves specifying the syntactic decomposition of the model $M$ into modules $M_1$ and $M_2$.
While the proposed compositional approach does not always lead to improvement (this can happen when no "good" assumption exists for the chosen decomposition into modules $M_1$ and $M_2$), dramatic gains are observed in some cases, reducing either the required time or memory by one or two orders of magnitude, or converting infeasible problems into feasible ones. Finally, it is worth pointing out that, while our prototype uses BDD-based state-space exploration, the approach can easily be adapted to permit other model checking strategies such as SAT-based model checking [8, 18] and counterexample-guided abstraction refinement [15, 11].

Related Work

Compositional reasoning using assume-guarantee rules has a long history in the formal verification literature [22, 13, 1, 4, 17, 14, 19]. While such reasoning is supported by some tools (e.g. Mocha [5]), the challenging task of finding the appropriate assumptions is typically left to the user, and only a few attempts have been made to automate the assumption generation (in [3], the authors present some heuristics for automatically constructing assumptions using game-theoretic techniques). Our work is inspired by the recent series of papers by the researchers at NASA Ames on compositional verification using learning [12, 7]. Compared to these papers, we believe that our work makes three contributions. First, we present a symbolic implementation of the learning algorithm, and this is essential since the alphabet is exponential in the number of communication variables. Second, we address and explain explicitly how the $L^*$ algorithm, designed to learn an unknown but fixed language, is adapted to learn some assumption from a class of correct assumption languages. Finally, we demonstrate the benefits of the method by incorporating it in a state-of-the-art, publicly available symbolic model checker.
It is worth noting that recently the $L^*$ algorithm has found applications in formal verification besides automating assume-guarantee reasoning: our software verification project JIST uses predicate abstraction and learning to synthesize (dynamic) interfaces for Java classes [2]; [23] uses learning to compute the set of reachable states for verifying infinite-state systems; while [20] uses learning for black box checking, that is, verifying properties of partially specified implementations.

2 Symbolic modules

In this section, we formalize the notion of a symbolic module and the notion of composition of modules, and explain the assume-guarantee rule we use in this paper.

Symbolic modules

In the following, for any set of variables $X$, we will denote the set of primed variables of $X$ as $X' = \{x' \mid x \in X\}$. A predicate $\varphi$ over $X$ is a boolean formula over $X$, and for a valuation $s$ for variables in $X$, we write $\varphi(s)$ to mean that $s$ satisfies the formula $\varphi$. A symbolic module is a tuple $M(X, X^I, X^O, Init, T)$ with the following components:

- $X$ is a finite set of boolean variables controlled by the module,
- $X^I$ is a finite set of boolean input variables that the module reads from its environment; $X^I$ is disjoint from $X$,
- $X^O \subseteq X$ is a finite set of boolean output variables that are observable to the environment of $M$,
- $Init(X)$ is an initial state predicate over $X$,
- $T(X, X^I, X')$ is a transition predicate over $X \cup X^I \cup X'$, where $X'$ represents the variables encoding the successor state.

Let $X^{IO} = X^I \cup X^O$ denote the set of communication variables. A state $s$ of $M$ is a valuation of the variables in $X$; i.e. $s : X \to \{true, false\}$. Let $S$ denote the set of all states of $M$. An input state $s^I$ is a valuation of the input variables $X^I$ and an output state $s^O$ is a valuation of $X^O$. Let $S^I$ and $S^O$ denote the set of input states and output states, respectively. Also, $S^{IO} = S^I \times S^O$. For a state $s$ over a set $X$ of variables, let $s[Y]$, where $Y \subseteq X$, denote the valuation over $Y$ obtained by restricting $s$ to $Y$.
The semantics of a module is defined in terms of the set of runs it exhibits. A run of $M$ is a sequence $s_0, s_1, \ldots$, where each $s_i$ is a state over $X \cup X^I$, such that $Init(s_0[X])$ holds, and for every $i \ge 0$, $T(s_i[X], s_i[X^I], s'_{i+1}[X'])$ holds (where $s'_{i+1}(x') = s_{i+1}(x)$, for every $x \in X$). For a module $M(X, X^I, X^O, Init, T)$ and a safety property $\varphi(X^{IO})$, which is a boolean formula over $X^{IO}$, we define $M \models \varphi$ if, for every run $s_0, s_1, \ldots$, for every $i \ge 0$, $\varphi(s_i)$ holds. Given a run $s_0, s_1, \ldots$ of $M$, the trace of $M$ is a sequence $s_0[X^{IO}], s_1[X^{IO}], \ldots$ of input and output states. Let us denote the set of all the traces of $M$ as $L(M)$. Given two modules $M_1 = (X_1, X^I, X^O, Init_1, T_1)$ and $M_2 = (X_2, X^I, X^O, Init_2, T_2)$ that have the same input and output variables, we say $M_1$ is a refinement of $M_2$, denoted $M_1 \sqsubseteq M_2$, if $L(M_1) \subseteq L(M_2)$.

Composition of modules

The synchronous composition operator $\|$ is a commutative and associative operator that composes modules. Given two modules $M_1 = (X_1, X_1^I, X_1^O, Init_1, T_1)$ and $M_2 = (X_2, X_2^I, X_2^O, Init_2, T_2)$, with $X_1 \cap X_2 = \emptyset$, $M_1 \| M_2 = (X, X^I, X^O, Init, T)$ is a module where:

- $X = X_1 \cup X_2$, $X^I = (X_1^I \cup X_2^I) \setminus (X_1^O \uplus X_2^O)$, $X^O = X_1^O \uplus X_2^O$,
- $Init(X) = Init_1(X_1) \wedge Init_2(X_2)$,
- $T(X, X^I, X') = T_1(X_1, X_1^I, X_1') \wedge T_2(X_2, X_2^I, X_2')$.

We can now define the model-checking problem we consider in this paper:

Given modules $M_1 = (X_1, X_1^I, X_1^O, Init_1, T_1)$ and $M_2 = (X_2, X_2^I, X_2^O, Init_2, T_2)$, with $X_1 \cap X_2 = \emptyset$, $X_1^I = X_2^O$ and $X_1^O = X_2^I$ (let $X^{IO} = X_1^{IO} = X_2^{IO}$), and a safety property $\varphi(X^{IO})$, does $(M_1 \| M_2) \models \varphi$?

Note that we are assuming that the safety property $\varphi$ is a predicate over the common communication variables $X^{IO}$. This is not a restriction: to check a property that refers to private variables of the modules, we can simply declare them to be outputs.

Assume-guarantee rule

We use the following assume-guarantee rule to prove that a safety property $\varphi$ holds for a module $M = M_1 \| M_2$.
In the rule below, $A$ is a module that has the same input and output variables as $M_2$:

\[ \frac{M_1 \| A \models \varphi \qquad M_2 \sqsubseteq A}{M_1 \| M_2 \models \varphi} \]

The rule above says that if there exists (some) module $A$ such that the composition of $M_1$ and $A$ is safe (i.e. satisfies the property $\varphi$) and $M_2$ refines $A$, then $M_1 \| M_2$ satisfies $\varphi$. We can view such an $A$ as an adequate assumption between $M_1$ and $M_2$: it is an abstraction of $M_2$ (possibly admitting more behaviors than $M_2$) that is a strong enough assumption for $M_1$ to make in order to satisfy $\varphi$. Our aim is to construct such an assumption $A$ to show that $M_1 \| M_2$ satisfies $\varphi$. This rule is sound and complete [19].

3 Assumption Generation via Computational Learning

Given a symbolic module $M = M_1 \| M_2$ consisting of two sub-modules and a safety property $\varphi$, our aim is to verify that $M$ satisfies $\varphi$ by finding an $A$ that satisfies the premises of the assume-guarantee rule explained in Section 2. Let us fix a pair of such modules $M_1 = (X_1, X_1^I, X_1^O, Init_1, T_1)$ and $M_2 = (X_2, X_2^I, X_2^O, Init_2, T_2)$ for the rest of this section.

Let $L_1$ be the set of all traces $\sigma = s_0, s_1, \ldots$, where each $s_i \in S^{IO}$, such that either $\sigma \notin L(M_1)$ or $\varphi(s_i)$ holds for all $i \ge 0$. Thus, $L_1$ is the largest language for $M_1$'s environment that can keep $M_1$ safe. Note that the languages of the candidates for $A$ that satisfy the first premise of the proof rule form precisely the set of all subsets of $L_1$. Let $L_2$ be the set of traces of $M_2$, that is, $L(M_2)$. The languages of candidates for $A$ that satisfy the second premise of the proof rule form precisely the set of all supersets of $L_2$.

Since $M_1$ and $M_2$ are finite, it is easy to see that $L_1$ and $L_2$ are in fact regular languages. Let $B_1$ be the module corresponding to the minimum state DFA accepting $L_1$. The problem of finding $A$ satisfying both proof premises hence reduces to checking for a language which is a superset of $L_2$ and a subset of $L_1$. To discover such an assumption $A$, our strategy is to construct $A$ using a learning algorithm for regular languages, called the $L^*$ algorithm.
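As a hedged illustration of the two premises of the rule, the following explicit-state Python sketch checks them on a toy pair of modules and compares the outcome with a direct check of the composition. The toy modules, the letter encoding (a letter is just the value of one shared variable y) and all function names are assumptions made for this example only; the approach described in this paper is symbolic and BDD-based, not explicit-state.

```python
from collections import deque

def reachable(init, successors):
    """Breadth-first search returning every state reachable from init."""
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical toy modules (illustration only).
# M1 copies its input y into x each round; the property is "x is always 0".
def m1_step(x, y):
    return y                      # x' := y

def m1_safe(x):
    return x == 0

# M2 emits y each round; its state is y itself and it never changes.
M2_INIT = 0
def m2_moves(y):
    return [(y, y)]               # list of (emitted letter, next state)

# Assumption A: a DFA that only allows the letter y = 0.
A_INIT = "a0"
A_DELTA = {("a0", 0): "a0"}       # a missing entry means the letter is ruled out

def premise1_m1_with_a_is_safe():
    """M1 || A |= phi: explore M1 driven only by letters that A allows."""
    def succ(pair):
        x, a = pair
        return [(m1_step(x, y), a2) for (a1, y), a2 in A_DELTA.items() if a1 == a]
    return all(m1_safe(x) for x, _ in reachable((0, A_INIT), succ))

def premise2_m2_refines_a():
    """M2 refines A: every letter M2 can emit must be allowed by A."""
    ok = True
    def succ(pair):
        nonlocal ok
        s2, a = pair
        out = []
        for y, s2_next in m2_moves(s2):
            if (a, y) in A_DELTA:
                out.append((s2_next, A_DELTA[(a, y)]))
            else:
                ok = False        # M2 produced a letter that A forbids
        return out
    reachable((M2_INIT, A_INIT), succ)
    return ok

def composition_is_safe():
    """Direct check of M1 || M2 |= phi, for comparison with the rule."""
    def succ(pair):
        x, s2 = pair
        return [(m1_step(x, y), s2_next) for y, s2_next in m2_moves(s2)]
    return all(m1_safe(x) for x, _ in reachable((0, M2_INIT), succ))
```

In this toy setting both premises hold, so the rule lets one conclude safety of the composition without ever building the product of M1 and M2; the direct check is included only to confirm the conclusion.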
The $L^*$ algorithm is an algorithm for a learner trying to learn a fixed unknown regular language $U$ through membership queries and equivalence queries. Membership queries ask whether a given string is in $U$. An equivalence query asks whether a given language $L(C)$ (presented as a DFA $C$) equals $U$; if so, the teacher answers 'yes' and the learner has learnt the language, and if not, the teacher provides a counter-example, which is a string in the symmetric difference of $L(C)$ and $U$.

We adapt the $L^*$ algorithm to learn some language from a range of languages, namely to learn a language that is a superset of $L_2$ and a subset of $L_1$. We do not, of course, construct $L_1$ or $L_2$ explicitly, but instead answer queries using model-checking queries performed on $M_1$ and $M_2$ respectively.

Given an equivalence query with conjecture $L(C)$, the test for equivalence can be split into two: checking the subset query $L(C) \subseteq U$ and checking the superset query $L(C) \supseteq U$. To check the subset query, we check if $L(C) \subseteq L_1$, and to check the superset query we check whether $L(C) \supseteq L_2$. If these two tests pass, then we declare that the learner has indeed learnt the language, as the conjecture is an adequate assumption.

The membership query is more ambiguous to handle. When the learner asks whether a word $w$ is in $U$, if $w$ is not in $L_1$, then we can clearly answer in the negative, and if $w$ is in $L_2$ then we can answer in the affirmative. However, if $w$ is in $L_1$ but not in $L_2$, then answering either positively or negatively can rule out certain candidates for $A$. In this paper, the strategy we have chosen is to always answer membership queries with respect to $L_1$. It is possible to explore alternative strategies that involve $L_2$ also.

[Figure 1: flowchart. The $L^*$ algorithm issues membership queries memb($\sigma$), answered Yes/No by checking $M_1 \| \sigma \models \varphi$, and equivalence queries equiv($C$) after generating a conjecture $C$. For equiv($C$), first $M_1 \| C \models \varphi$ is checked (No: a counter-example cex is returned to $L^*$). If Yes, $M_2 \sqsubseteq C$ is checked (Yes: $M_1 \| M_2 \models \varphi$). If No, with $\sigma \in L(M_2) \setminus L(C)$, $M_1 \| \sigma \models \varphi$ is checked (Yes: $\sigma$ is returned to $L^*$; No: $M_1 \| M_2 \not\models \varphi$ and $\sigma$ is a counter-example).]

Fig. 1.
Overview of compositional verification by learning assumptions

Figure 1 illustrates the high-level overview of our compositional verification procedure. Membership queries are answered by checking safety with respect to $M_1$. To answer the equivalence query, we first check the subset query (by a safety check with respect to $M_1$); if the query fails, we return the counter-example found to $L^*$. If the subset query passes, then we check the superset query by checking refinement with respect to $M_2$. If this superset query also passes, then we declare that $M$ satisfies $\varphi$, since $C$ satisfies both premises of the proof rule. Otherwise, we check if the counter-example trace $\sigma$ (which is a behavior of $M_2$ but not in $L(C)$) keeps $M_1$ safe. If it does not, we conclude that $M_1 \| M_2$ does not satisfy $\varphi$; otherwise, we give $\sigma$ back to the $L^*$ algorithm as a counter-example to the superset query.

One of the nice properties of the $L^*$ algorithm is that it takes time polynomial in the size of the minimal automaton accepting the learnt language (and polynomial in the lengths of the counter-examples provided by the teacher). Let us now estimate bounds on the size of the automaton constructed by our algorithm, and simultaneously show that our procedure always terminates.

Note that all membership queries and all counter-examples provided by the teacher in our algorithm are consistent with respect to $L_1$ (membership and subset queries are resolved using $L_1$, and counter-examples to superset queries, though derived using $M_2$, are checked for consistency with $L_1$ before they are passed to the learner). Now, if $M_1 \| M_2$ does indeed satisfy $\varphi$, then $L_2$ is a subset of $L_1$ and hence $B_1$ is an adequate assumption that witnesses the fact that $M_1 \| M_2$ satisfies $\varphi$. If $M_1 \| M_2$ does not satisfy $\varphi$, then $L_2$ is not a subset of $L_1$. Again $B_1$ is an adequate automaton which, if learnt, will show that $M_1 \| M_2$ does not satisfy $\varphi$ (since this assumption, when checked with $M_2$, will result in a run which is exhibited by $M_2$ but not in $L_1$, and hence not safe with respect to $M_1$).
Hence $B_1$ is an adequate automaton to learn in both cases to answer the model-checking question, and all answers to queries are consistent with $B_1$. The $L^*$ algorithm has the property that the automata it constructs monotonically grow with each iteration in terms of the number of states, and are always minimal. Consequently, we are assured that our procedure will not construct any automaton larger than $B_1$. Hence our procedure always halts and reports correctly whether $M_1 \| M_2$ satisfies $\varphi$, and in doing so, it never generates any assumption with more states than the minimal DFA accepting $L_1$.

4 Symbolic implementation of $L^*$ algorithm

4.1 $L^*$ algorithm

The $L^*$ algorithm learns an unknown regular language and generates a minimal DFA that accepts the regular language. This algorithm was introduced by Angluin [6], but we use an improved version by Rivest and Schapire [21]. The algorithm infers the structure of the DFA by asking a teacher, who knows the unknown language, membership and equivalence queries. Figure 2 illustrates the improved version of the $L^*$ algorithm [21].

     1: R := {ε}; E := {ε};
     2: foreach (a ∈ Σ) { G[ε, ε] := member(ε·ε); G[ε·a, ε] := member(ε·a·ε); }
     3: repeat:
     4:   while ((r_new := closed(R, E, G)) ≠ null) {
     5:     add(R, r_new);
     6:     foreach (a ∈ Σ), (e ∈ E) { G[r_new·a, e] := member(r_new·a·e); }
     7:   }
     8:   C := makeConjectureMachine(R, E, G);
     9:   if ((cex := equivalent(C)) = null) then return C;
    10:   else {
    11:     e_new := findSuffix(cex);
    12:     add(E, e_new);
    13:     foreach (r ∈ R), (a ∈ Σ) {
    14:       G[r, e_new] := member(r·e_new); G[r·a, e_new] := member(r·a·e_new);
    15:   } }

Fig. 2. $L^*$ algorithm

Let $U$ be the unknown regular language and $\Sigma$ be its alphabet. At any given time, the $L^*$ algorithm has, in order to construct a conjecture machine, information about a finite collection of strings over $\Sigma$, classified either as members or non-members of $U$. This information is maintained in an observation table $(R, E, G)$ where $R$ and $E$ are sets of strings over $\Sigma$, and $G$ is a function from $(R \cup R \cdot \Sigma) \cdot E$ to $\{0, 1\}$.
More precisely, $R$ is a set of representative strings for states in the DFA such that each representative string $r_q \in R$ for a state $q$ leads from the initial state (uniquely) to the state $q$, and $E$ is a set of experiment suffix strings that are used to distinguish states (for any two states of the automaton being built, there is a string in $E$ which is accepted from one and rejected from the other). $G$ maps strings $\sigma$ in $(R \cup R \cdot \Sigma) \cdot E$ to 1 if $\sigma$ is in $U$, and to 0 otherwise. Initially, $R$ and $E$ are set to $\{\varepsilon\}$, and $G$ is initialized using membership queries for every string in $(R \cup R \cdot \Sigma) \cdot E$ (line 2). In line 4, it checks whether the observation table is closed. The function closed(R, E, G) returns null (meaning true) if for every $r \in R$ and $a \in \Sigma$, there exists $r' \in R$ such that $G[r \cdot a, e] = G[r', e]$ for every $e \in E$; otherwise, it returns $r \cdot a$ such that there is no $r'$ satisfying the above condition. If the table is not closed, each such $r \cdot a$ (e.g., $r_{new}$ is $r \cdot a$ in line 5) is simply added to $R$. The algorithm again updates $G$ with regard to $r \cdot a$ (line 6). Once the table is closed, it constructs a conjecture DFA $C = (Q, q_0, F, \delta)$ as follows (line 8): $Q = R$, $q_0 = \varepsilon$, $F = \{r \in R \mid G[r, \varepsilon] = 1\}$, and for every $r \in R$ and $a \in \Sigma$, $\delta(r, a) = r'$ such that $G[r \cdot a, e] = G[r', e]$ for every $e \in E$. Finally, if the answer for the equivalence query is 'yes', it returns the current conjecture machine $C$; otherwise, a counter-example $cex \in (L(C) \setminus U) \cup (U \setminus L(C))$ is provided by the teacher. The algorithm analyzes the counter-example $cex$ in order to find the longest suffix $e_{new}$ of $cex$ that witnesses a difference between $U$ and $L(C)$ (line 11). Intuitively, the current conjecture machine has guessed wrong since this point. Adding $e_{new}$ to $E$ reflects the difference in the next conjecture by splitting states in $C$. It then updates $G$ with respect to $e_{new}$.
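The loop described above can be sketched in explicit (non-symbolic) Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: words are plain strings, the function names `lstar`, `accepts` and `make_teacher` are hypothetical, and the equivalence oracle is a brute-force comparison on all words up to a bounded length, which only approximates a real teacher. The suffix search scans the counter-example from the left, so the first suffix it finds that exposes a difference is the longest one.

```python
from itertools import product

def lstar(alphabet, member, equivalent):
    """Learn a DFA (states, initial, accepting, delta) from two oracles."""
    R, E = [""], [""]              # representative strings, experiment suffixes
    cache = {}

    def G(word):                   # memoised membership query
        if word not in cache:
            cache[word] = member(word)
        return cache[word]

    def row(prefix):
        return tuple(G(prefix + e) for e in E)

    while True:
        # Close the table: each one-letter extension must match a row of R.
        changed = True
        while changed:
            changed = False
            rows = {row(r) for r in R}
            for r, a in product(list(R), alphabet):
                if row(r + a) not in rows:
                    R.append(r + a)        # new state discovered
                    changed = True
                    break
        # Build the conjecture DFA; rows of R are pairwise distinct.
        index = {row(r): r for r in R}
        delta = {(r, a): index[row(r + a)] for r in R for a in alphabet}
        accepting = {r for r in R if G(r)}
        dfa = (list(R), "", accepting, delta)
        cex = equivalent(dfa)
        if cex is None:
            return dfa
        # Find the longest suffix of cex that witnesses a difference.
        # Assumes cex really is a counter-example; otherwise this loops forever.
        state, prev = "", G(cex)
        for i, a in enumerate(cex):
            state = delta[(state, a)]
            cur = G(state + cex[i + 1:])
            if cur != prev:
                E.append(cex[i + 1:])
                break
            prev = cur

def accepts(dfa, word):
    """Run the learned DFA on a word."""
    _, state, accepting, delta = dfa
    for a in word:
        state = delta[(state, a)]
    return state in accepting

def make_teacher(target, alphabet, max_len=6):
    """Approximate equivalence oracle: compare on all words up to max_len."""
    def equivalent(dfa):
        for n in range(max_len + 1):
            for letters in product(alphabet, repeat=n):
                w = "".join(letters)
                if accepts(dfa, w) != target(w):
                    return w
        return None
    return equivalent
```

For instance, with the hypothetical target `lambda w: w.count("a") % 3 == 0` over the alphabet `"ab"`, the first conjecture has two states, the bounded teacher returns a counter-example, and the added experiment splits a state so that the next conjecture has three states.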
The $L^*$ algorithm is guaranteed to construct a minimal DFA for the unknown regular language using only $O(|\Sigma| n^2 + n \log m)$ membership queries and at most $n - 1$ equivalence queries, where $n$ is the number of states in the final DFA and $m$ is the length of the longest counter-example provided by the teacher for equivalence queries. As we discussed in Section 3, we use the $L^*$ algorithm to identify $A(X_A, X_A^I, X_A^O, Init_A, T_A)$ satisfying the premises of the proof rule, where $X_A^{IO} = X^{IO}$. $A$ is hence a language over the alphabet $S^{IO}$, and the $L^*$ algorithm can learn $A$ in time polynomial in the size of $A$ (and the counter-examples). However, when we apply the $L^*$ algorithm to analyze a large module (especially when the number of input and output variables is large), the large alphabet size poses many problems: (1) the constructed DFA has too many edges when represented explicitly, (2) the size of the observation table, which is polynomial in $|\Sigma|$ and the size of the conjectured automaton, gets very large, and (3) the number of membership queries needed to fill each entry in the observation table also increases. To resolve these problems, we present a symbolic implementation of the $L^*$ algorithm.

4.2 Symbolic implementation

For describing our symbolic implementation of the $L^*$ algorithm, we first explain the essential data structures the algorithm needs, and then present our implicit data structures corresponding to them. The $L^*$ algorithm uses the following data structures:

- string R[int]: each R[i] is a representative string for the i-th state $q_i$ in the conjecture DFA.
- string E[int]: each E[i] is the i-th experiment string.
- boolean G1[int][int]: each G1[i][j] is the result of the membership query for R[i]·E[j].
- boolean G2[int][int][int]: each G2[i][j][k] is the result of the membership query for R[i]·$a_j$·E[k], where $a_j$ is the j-th alphabet symbol in $\Sigma$.
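To make the alphabet-size concern concrete, here is a short worked calculation. It takes the figures reported for guidance1 in Section 5 (23 input/output variables, a learnt assumption with 20 states) and assumes, for illustration only, a complete explicit DFA that stores one edge per state and alphabet symbol:

\[
|\Sigma| = |S^{IO}| = 2^{|X^{IO}|} = 2^{23} = 8{,}388{,}608,
\qquad
n \cdot |\Sigma| = 20 \cdot 2^{23} = 167{,}772{,}160 \ \text{explicit edges}.
\]

Under the same assumption, the explicit array G2 would need nQ $\cdot\, |\Sigma| \,\cdot$ nE entries, each filled by its own membership query, which is what motivates replacing it by an implicit representation.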
Note that $G$ of the observation table is split into two arrays, G1 and G2, where G1 is an array for a function from $R \cdot E$ to $\{0, 1\}$ and G2 is for a function from $R \cdot \Sigma \cdot E$ to $\{0, 1\}$. The $L^*$ algorithm initializes the data structures as follows: R[0] = E[0] = ε, G1[0][0] = member(ε·ε), and G2[0][i][0] = member(ε·$a_i$·ε) (for every $a_i \in \Sigma$). Once it introduces a new state or a new experiment, it adds to R[] or E[] and updates G1 and G2 by membership queries. These arrays also encode the edges of the conjecture machine: there is an edge from state $q_i$ to $q_j$ on $a_k$ when G2[i][k][l] = G1[j][l] for every l.

For symbolic implementation, we do not wish to construct G2 in order to construct conjecture DFAs by explicit membership queries, since $|\Sigma|$ is too large. While the explicit $L^*$ algorithm asks, for each state $r$, alphabet symbol $a$ and experiment $e$, if $r \cdot a \cdot e$ is a member, we compute, given a state $r$ and a boolean vector $v$, the set of alphabet symbols $a$ such that for every $j \le |v|$, member($r \cdot a \cdot e_j$) = v[j]. For this, we have the following data structures:

- int nQ: the number of states in the current DFA.
- int nE: the number of experiment strings.
- BDD R[int]: each R[i] ($0 \le i <$ nQ) is a BDD over $X_1$ to represent the set of states of the module $M_1$ that are reachable from an initial state of $M_1$ by the representative string $r_i$ of the i-th state $q_i$: postImage($Init_1(X_1)$, $r_i$).
- BDD E[int]: each E[i] ($0 \le i <$ nE) is a BDD over $X_1$ to capture a set of states of $M_1$ from which some state violating $\varphi$ is reachable by the i-th experiment string $e_i$: preImage($\neg\varphi(X_1)$, $e_i$).
- booleanVector G1[int]: each G1[i] ($0 \le i <$ nQ) is the boolean vector for the state $q_i$, where the length of each boolean vector always equals nE. Note that as nE is increased, the length of each boolean vector is also increased. For $i \ne j$, G1[i] $\ne$ G1[j].
  Each element G1[i][j] of G1[i] ($0 \le j <$ nE) represents whether $r_i \cdot e_j$ is a member, where $r_i$ is a representative string for R[i] and $e_j$ is an experiment string for E[j]: whether R[i] and E[j] have empty intersection.
- booleanVector Cd[int]: every iteration of the $L^*$ algorithm splits some states of the current conjecture DFA by a new experiment string. More precisely, the new experiment splits every state into two state candidates, and among them, only reachable ones are constructed as states of the next conjecture DFA. The Cd[] vector describes all these state candidates and each element is the boolean vector of each candidate. |Cd| = 2·nQ.

Given $M = M_1 \| M_2$ and $\varphi$, we initialize the data structures as follows. R[0] is the BDD for $Init_1(X_1)$ and E[0] is the BDD for $\neg\varphi$ since the corresponding representative and experiment string are ε, and G1[0][0] = 1 since we assume that every initial state satisfies $\varphi$. In addition, we have the following functions that manipulate the above data structures for implementing the $L^*$ algorithm implicitly (Figure 3 illustrates the pseudo-code for the important ones):

    BDD edges(int i, booleanVector v) {
      BDD eds := true;                // eds is a BDD over X^IO.
      foreach (0 ≤ j < nE) {          // In the below, X_1^L = X_1 \ X^IO.
        if (v[j]) then eds := eds ∧ ¬(∃X_1^L, X_1'. R[i](X_1) ∧ T_1(X_1, X_1^I, X_1') ∧ E[j](X_1'));
        else eds := eds ∧ (∃X_1^L, X_1'. R[i](X_1) ∧ T_1(X_1, X_1^I, X_1') ∧ E[j](X_1'));
      }
      return eds;
    }

    void addR(int i, BDD b, booleanVector v) {
      BDD io := pickOneState(b);      // io is a BDD representing one alphabet symbol.
      R[nQ] := (∃X_1, X_1^I. R[i](X_1) ∧ io ∧ T_1(X_1, X_1^I, X_1'))[X_1' → X_1];
      G1[nQ++] := v;
    }

    void addE(BDD[] bs) {
      BDD b := φ;                     // b is a BDD over X_1.
      for (j := length(bs); j > 0; j--) {
        b := ∃X_1^I, X_1'. b(X_1') ∧ bs[j] ∧ T_1(X_1, X_1^I, X_1');
      }
      E[nE] := ¬b;
      foreach (0 ≤ i < nQ) {
        if ((R[i] ∧ E[nE]) = false) then G1[i][nE] := 1;
        else G1[i][nE] := 0;
        foreach (0 ≤ j < nE) {
          Cd[2i][j] := G1[i][j]; Cd[2i+1][j] := G1[i][j];
        }
        Cd[2i][nE] := 0; Cd[2i+1][nE] := 1;
      }
      nE++;
    }

Fig. 3.
Symbolic implementation of observation table

- BDD edges(int, booleanVector): this function, given an integer i and a boolean vector v ($0 \le i <$ nQ, |v| = nE), returns a BDD over $X^{IO}$ representing the set of alphabet symbols by which there is an edge from state $q_i$ to a state that has v as its boolean vector.
- void addR(int, BDD, booleanVector): when we introduce a new state (whose predecessor state is $q_i$, the BDD representing edges from $q_i$ is b and the boolean vector is v), addR(i, b, v) updates R, G1 and nQ.
- void addE(BDD[]): given a new experiment string represented as an array of BDDs (where each BDD of the array encodes the corresponding state in the experiment string), this function updates E, G1 and nE. It also constructs a new set Cd[] of state candidates for the next iteration.
- boolean isInR(booleanVector): given a boolean vector v, isInR(v) checks whether v = G1[i] for some i.
- BDD[] findSuffix(BDD[]): given a counter-example cex (from equivalence queries) represented by a BDD array, findSuffix(cex) returns a BDD array representing the longest suffix that witnesses the difference between the conjecture DFA and $A$.

While the $L^*$ algorithm constructs a conjecture machine by computing G2 and comparing between G1 and G2, we directly make a symbolic conjecture DFA $C(X_C, X^{IO}, Init_C, F_C, T_C)$ with the following components:

- $X_C$ is a set of boolean variables representing states in $C$ ($|X_C| = \lceil \log_2 \mathrm{nQ} \rceil$). Valuations of the variables can be encoded from its index for R.
- $X^{IO}$ is a set of boolean variables defining its alphabet, which comes from $M_1$ and $M_2$.
- $Init_C(X_C)$ is an initial state predicate over $X_C$. $Init_C(X_C)$ is encoded from the index of the state $q_0$: $Init_C(X_C) = \bigwedge_{x \in X_C} (x \equiv 0)$.
- $F_C(X_C)$ is a predicate for accepting states. It is encoded from the indices of the states $q_i$ such that G1[i][0] = 1.
- $T_C(X_C, X^{IO}, X_C')$ is a transition predicate over $X_C \cup X^{IO} \cup X_C'$; that is, if $T_C(i, a, j) = true$, then the DFA has an edge from state $q_i$ to $q_j$ labeled by $a$.
  To get this predicate, we compute a set of edges from every state $q_i$ to every state candidate with boolean vector v by calling edges(i, v).

This symbolic DFA $C(X_C, X^{IO}, Init_C, F_C, T_C)$ can be easily converted to a symbolic module $M_C(X_C, X^I, X^O, Init_C, T_C)$. Now, we can construct a symbolic conjecture DFA $C$ using implicit membership queries by edges(). In addition, we have the following functions for equivalence queries:

- BDD[] subsetQ(SymbolicDFA): our subset query is to check whether all strings allowed by $C$ make $M_1$ stay in states satisfying $\varphi$. Hence, given a symbolic DFA $C(X_C, X^{IO}, Init_C, F_C, T_C)$, we check $M_1 \| M_C \models (F_C \to \varphi)$ by reachability checking, where $M_C$ is a symbolic module converted from $C$. If so, it returns null; otherwise, it returns a BDD array as a counter-example.
- BDD[] supersetQ(SymbolicDFA): it checks that $M_2 \sqsubseteq C$. The return value is similar to that of subsetQ(). Since $C$ is again a (symbolic) DFA, we can simply implement it by symbolic reachability computation for the product of $M_2$ and $M_C$. If it reaches a non-accepting state of $C$, the sequence reaching the non-accepting state is a witness showing $M_2 \not\sqsubseteq C$.
- boolean safeM1(BDD[]): given a string $\sigma$ represented by a BDD array, it executes $M_1$ according to $\sigma$. If the execution reaches a state violating $\varphi$, it returns false; otherwise, it returns true.

Figure 4 illustrates our symbolic compositional verification (SCV) algorithm. We initialize nQ, nE, R, E, G1, Cd and C in lines 1-3. We then compute a set of edges (a BDD) from every source state $q_i$ to every state candidate with boolean vector Cd[j]. Once we reach a new state, we update R, nQ and G1 by addR() (line 9). This step makes the conjecture machine closed. If we have a non-empty edge set by edges(), then we update the conjecture $C$ (line 10). After constructing a conjecture DFA, we ask an equivalence query as discussed in Section 3 (lines 12-15).
If we can conclude neither true nor false from the query, we are provided a counter-example from the teacher and get a new experiment string from the counter-example. E, nE, Cd and G1 are then updated based on the new experiment string. We implement this algorithm with the BDD package in the symbolic model checker NuSMV.

    boolean SCV(M1, M2, φ)
     1: nQ := 1; nE := 1; R[0] := Init_1(X_1); E[0] := ¬φ;
     2: G1[0][0] := 1; Cd[0] := 0; Cd[1] := 1;
     3: C := initializeC();
     4: repeat:
     5:   foreach (0 ≤ i < nQ) {
     6:     foreach (0 ≤ j < 2·nQ) {
     7:       eds := edges(i, Cd[j]);
     8:       if (eds ≠ false) then {
     9:         if (¬isInR(Cd[j])) then addR(i, eds, Cd[j]);
    10:         C := updateC(i, eds, indexofR(Cd[j]));
    11:   } } }
    12:   if ((cex := subsetQ(C)) = null) then {
    13:     if ((cex := supersetQ(C)) = null) then return true;
    14:     else if (¬safeM1(cex)) then return false;
    15:   }
    16:   addE(findSuffix(cex));

Fig. 4. Symbolic compositional verification algorithm

5 Experiments

We first explain an artificial example (called 'simple') to illustrate our method and then report results on 'simple' and four examples from the NuSMV package.

Example: simple

Module $M_1$ has a variable $x$ (initially set to 0 and updated by the rule $x' := y$ in each round, where $y$ is an input variable) and a dummy array that does not affect $x$ at all. Module $M_2$ has a variable $y$ (initially set to 0 and never updated) and also a dummy array that does not affect $y$ at all. For $M_1 \| M_2$, we want to check that $x$ is always 0. Both dummy arrays are from an example swap known to be hard for BDD encoding [18]. Our tool explores $M_1$ and $M_2$ separately with a two-state assumption (which allows only $y = 0$), while ordinary model checkers will search the whole state space of $M_1 \| M_2$. For some examples from the NuSMV package, we slightly modified them because our tool does not support the full syntax of the NuSMV language. The primary selection criterion was to include examples for which NuSMV takes a long time or fails to complete.
All experiments were performed on a Sun-Blade-1000 workstation with 1GB of memory running SunOS 5.9. The results for the examples are shown in Table 1. We compare our symbolic compositional verification tool (SCV) with the invariant checking (with early termination) of NuSMV 2.2.2. The table lists the number of variables in total, in M1 and in M2, the number of input/output variables between the modules, the execution time in seconds, the peak BDD size and the number of states in the assumption we learn (for SCV). Entries denoted `-' mean that a tool did not complete within 2 hours.

example    spec   total  M1    M2    IO    SCV       SCV          SCV assumption  NuSMV     NuSMV
name              var    var   var   var   time (s)  peak BDD     states          time (s)  peak BDD
simple1    true   69     36    33    4     19.2      607,068      2               269       3,993,976
simple2    true   78     41    37    5     106       828,842      2               4032      32,934,972
simple3    true   86     45    41    5     754       3,668,980    2               -         -
simple4    true   94     49    45    5     4601      12,450,004   2               -         -
guidance1  false  135    24    111   23    124       686,784      20              -         -
guidance2  true   122    24    98    22    196       1,052,660    2               -         -
guidance3  true   122    58    64    46    357       619,332      2               -         -
barrel1    false  60     30    30    10    20.3      345,436      3               1201      28,118,286
barrel2    true   60     30    30    10    23.4      472,164      4               4886      36,521,170
barrel3    true   60     30    30    10    -         -            too many        -         -
msi1       true   45     26    19    25    2.1       289,226      2               157       1,554,462
msi2       true   57     26    31    25    37.0      619,332      2               3324      16,183,370
msi3       true   70     26    44    26    1183      6,991,502    2               -         -
robot1     false  92     8     84    12    1271      4,169,760    11              654       2,729,762
robot2     true   92     22    70    12    1604      2,804,368    42              1039      1,117,046

Table 1. Experimental results

The results of simple are also shown in Table 1. For simple1 through simple4, we just increased the size of the dummy arrays from 8 to 11 and checked the same specification. As we expected, SCV generated a 2-state assumption and performed significantly better than NuSMV (for example, 106 versus 4032 seconds on simple2, while NuSMV did not complete on simple3 and simple4).

The second example, guidance, is a model of a space shuttle digital autopilot. We added redundant variables to M1 and M2 and did not use the given variable ordering information, as both tools finished fast with the original model and the ordering.
The specifications were picked from the given pool: guidance1, guidance2 and guidance3 have the same models but different specifications. For guidance1, our tool found a counter-example with an assumption having 20 states (if this assumption had been explicitly constructed, the 23 I/O variables would have caused far too many edges to store explicitly).

The third set, barrel, is an example for bounded model checking, and no variable ordering works well for BDD-based tools. barrel1 has an invariant derived from the original, but barrel2 and barrel3 have our own ones. barrel1, barrel2 and barrel3 have the same model scaled up from the original, but with different initial predicates.

The fourth set, msi, is an MSI cache protocol model and shows how the tools scale on a real example. We scaled up the original model with 3 nodes: msi1 has 3 nodes, msi2 has 4 nodes and msi3 has 5 nodes. They have the same specification, which is related to only two nodes, and we fixed the same component M1 in all of them. As the number of nodes grew, NuSMV required much more time and the BDD sizes grew more quickly than in our tool.

robot1 and robot2 are robotics controller models, and we again added redundant variables to M1 and M2, as in the case of the guidance example. Even though SCV took more time, this example shows that SCV can be applied to models for which non-trivial assumptions are needed. More details about the examples are available at http://www.cis.upenn.edu/~wnam/cav05/.

References

1. M. Abadi and L. Lamport. Conjoining specifications. ACM TOPLAS, 17:507–534, 1995.
2. R. Alur, P. Cerny, P. Madhusudan, and W. Nam. Synthesis of interface specifications for Java classes. In Proc. 32nd ACM POPL, pages 98–109, 2005.
3. R. Alur, L. de Alfaro, T.A. Henzinger, and F. Mang. Automating modular verification. In CONCUR'99: Concurrency Theory, LNCS 1664, pages 82–97, 1999.
4. R. Alur and T.A. Henzinger. Reactive modules. Formal Methods in System Design, 15(1):7–48, 1999. A preliminary version appears in Proc. 11th LICS, 1996.
5. R. Alur, T.A. Henzinger, F. Mang, S. Qadeer, S. Rajamani, and S. Tasiran. MOCHA: Modularity in model checking. In 10th CAV, pages 516–520, 1998.
6. D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75:87–106, 1987.
7. H. Barringer, C.S. Pasareanu, and D. Giannakopoulou. Proof rules for automated compositional verification through learning. In Proc. 2nd SVCBS, 2003.
8. A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Proc. 5th TACAS, pages 193–207, 1999.
9. R.E. Bryant. Graph-based algorithms for boolean-function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.
10. A. Cimatti, E. Clarke, E. Giunchiglia, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella. NuSMV Version 2: An OpenSource Tool for Symbolic Model Checking. In Proc. CAV 2002, LNCS 2404, pages 359–364, 2002.
11. E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In Computer Aided Verification, pages 154–169, 2000.
12. J.M. Cobleigh, D. Giannakopoulou, and C.S. Pasareanu. Learning assumptions for compositional verification. In Proc. 9th TACAS, LNCS 2619, pages 331–346, 2003.
13. O. Grumberg and D.E. Long. Model checking and modular verification. ACM Transactions on Programming Languages and Systems, 16(3):843–871, 1994.
14. T.A. Henzinger, S. Qadeer, and S. Rajamani. You assume, we guarantee: Methodology and case studies. In Proc. CAV 98, LNCS 1427, pages 521–525, 1998.
15. R.P. Kurshan. Computer-aided Verification of Coordinating Processes: the automata-theoretic approach. Princeton University Press, 1994.
16. K.L. McMillan. Symbolic model checking. Kluwer Academic Publishers, 1993.
17. K.L. McMillan. A compositional rule for hardware design refinement. In CAV 97: Computer-Aided Verification, LNCS 1254, pages 24–35, 1997.
18. K.L. McMillan. Applying SAT methods in unbounded symbolic model checking. In Proc. 14th Computer Aided Verification, LNCS 2404, pages 250–264, 2002.
19. K.S. Namjoshi and R.J. Trefler. On the completeness of compositional reasoning. In CAV 2000: Computer-Aided Verification, LNCS 1855, pages 139–153, 2000.
20. D. Peled, M.Y. Vardi, and M. Yannakakis. Black box checking. Journal of Automata, Languages and Combinatorics, 7(2):225–246, 2002.
21. R.L. Rivest and R.E. Schapire. Inference of finite automata using homing sequences. Information and Computation, 103(2):299–347, 1993.
22. E.W. Stark. A proof technique for rely-guarantee properties. In FST & TCS 85, LNCS 206, pages 369–391, 1985.
23. A. Vardhan, K. Sen, M. Viswanathan, and G. Agha. Actively learning to verify safety properties for FIFO automata. In Proc. 24th FSTTCS, pages 494–505, 2004.
Publication date: 2005